In the SAS® Enterprise Miner™ High-Performance Procedures documentation, the chapter "The HPFOREST Procedure" contains an example called "Missing Values and Imputed Values". The stated purpose of the example is as follows:
This example uses the Home Equity data from the SAS sample library to illustrate the difference between using missing values and using imputed values.
However, the conclusion that a difference exists is based on a false premise. You can ignore the example.
DETAILS
The DATA step code that creates the imputed table imout does not output the correct imputed values for the variables that were imputed by the PROC HPIMPUTE invocation. The incorrect values occur because the OUT= data set that is created by PROC HPIMPUTE does not preserve the order of the DATA= data set when the procedure runs in multithreaded mode (the default). The MERGE statement in the DATA step has no BY statement, so observations are combined by position rather than by a key. Therefore, the merged observations are not correctly matched.
The only way to guarantee data order is by using a PERFORMANCE statement with the NTHREADS=1 option in the PROC HPIMPUTE invocation.
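The following is a minimal sketch of that approach, not the code from the documentation example. The variables listed (MORTDUE and VALUE from sampsio.hmeq), the METHOD=MEAN choice, and the data set name imputed are assumptions chosen for illustration.

```sas
/* Sketch only: the variable list, imputation method, and output  */
/* data set name are illustrative assumptions.                    */
proc hpimpute data=sampsio.hmeq out=imputed;
   input mortdue value;
   impute mortdue value / method=mean;
   performance nthreads=1;   /* single thread, so OUT= keeps the DATA= row order */
run;

/* One-to-one merge with no BY statement: rows are paired by position, */
/* so this is valid only if both tables are in the same order.         */
data imout;
   merge sampsio.hmeq imputed;
run;
```

With NTHREADS=1, row n of imputed corresponds to row n of sampsio.hmeq, which is the condition that the positional merge depends on.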
If the MERGE is done correctly, then the variable-importance rankings and misclassification rates are similar with and without imputation. In that case, the premise "imputing variables reduce the predictive power of the variables" is no longer valid.
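| Product Family | Product | System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
|---|---|---|---|---|---|---|
| SAS System | SAS High-Performance Data Mining | Microsoft® Windows® for x64 | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
| SAS System | SAS High-Performance Data Mining | 64-bit Enabled AIX | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
| SAS System | SAS High-Performance Data Mining | 64-bit Enabled Solaris | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
| SAS System | SAS High-Performance Data Mining | Linux for x64 | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |
| SAS System | SAS High-Performance Data Mining | Solaris for x64 | 12.2 | 15.1 | 9.3 TS1M2 | 9.4 TS1M6 |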